Transformer Neural Networks (TNNs) have, over the past few years, begun to supplant earlier Machine Learning architectures such as Recurrent Neural Networks for processing sequential data, most notably in natural language processing.
Recently, TNN use has expanded to image recognition, as depicted in the diagram above, where (a minimal code sketch follows this list):
- Attention - an attention mechanism lets a Machine Learning model relate tokens, image patches in this case, to one another regardless of how far apart they are in the sequence
- Embedding - analogous to word embedding, the process of mapping inputs (here, image patches) to numeric vectors
- Linear Projection - a learned linear transformation that maps each flattened image patch to a fixed-size embedding vector
- MLP - Multi-Layer Perceptron, a feedforward artificial neural network
- Multi-Head Attention - multiple attention mechanisms run in parallel, each with its own learned projections, letting the model relate tokens in several representation subspaces at once
- Norm - layer normalization, which standardizes each token's features to stabilize training
- Transformer - a non-recurrent model architecture for processing sequential data
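
To make the pieces above concrete, here is a minimal NumPy sketch of a single Vision-Transformer-style encoder block: flattened patches are embedded via a learned linear projection, then passed through layer normalization, multi-head self-attention, and an MLP, with residual connections. All dimensions, weight shapes, and function names here are illustrative assumptions, not taken from any particular implementation.

```python
# Minimal, illustrative sketch of one Vision Transformer encoder block.
# Random weights stand in for learned parameters; shapes are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-6):
    # "Norm": normalize each token's features to zero mean, unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    # "Multi-Head Attention": every token attends to every other token,
    # regardless of position, via scaled dot-product attention per head.
    n, d = x.shape
    head_dim = d // num_heads
    q = (x @ w_q).reshape(n, num_heads, head_dim).transpose(1, 0, 2)
    k = (x @ w_k).reshape(n, num_heads, head_dim).transpose(1, 0, 2)
    v = (x @ w_v).reshape(n, num_heads, head_dim).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # (heads, n, n)
    out = softmax(scores) @ v                               # (heads, n, head_dim)
    return out.transpose(1, 0, 2).reshape(n, d) @ w_o      # concat heads, mix

def mlp(x, w1, b1, w2, b2):
    # "MLP": a two-layer feedforward network applied to each token.
    return np.maximum(x @ w1 + b1, 0) @ w2 + b2  # ReLU nonlinearity

# Toy setup: four flattened 8x8 RGB patches, embedding dimension 32.
d, num_heads, num_patches = 32, 4, 4
patches = rng.normal(size=(num_patches, 8 * 8 * 3))

# "Linear Projection" / "Embedding": map each patch to a d-dim vector,
# then add a (normally learned) position embedding.
w_embed = rng.normal(size=(8 * 8 * 3, d)) * 0.02
pos_embed = rng.normal(size=(num_patches, d)) * 0.02
tokens = patches @ w_embed + pos_embed                # (num_patches, d)

w_q, w_k, w_v, w_o = (rng.normal(size=(d, d)) * 0.02 for _ in range(4))
w1, b1 = rng.normal(size=(d, 4 * d)) * 0.02, np.zeros(4 * d)
w2, b2 = rng.normal(size=(4 * d, d)) * 0.02, np.zeros(d)

# One pre-norm encoder block: Norm -> Attention -> residual,
# then Norm -> MLP -> residual.
tokens = tokens + multi_head_attention(layer_norm(tokens), w_q, w_k, w_v, w_o, num_heads)
tokens = tokens + mlp(layer_norm(tokens), w1, b1, w2, b2)
print(tokens.shape)  # (4, 32)
```

In a full image classifier, this block would be stacked several times, a learned class token would typically be prepended to the patch tokens, and a final linear head would produce class scores; here the weights are random purely for illustration.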
Advancements such as image recognition TNNs continue the steady progress in Machine Learning model effectiveness and efficiency.

